Note: This site is a bit older; personal views may have changed.

Wrong Way to Talk Regex


Many programmers/users will write a regular expression
on one line and then do one of the following:

 -write paragraphs and paragraphs 
  trying to explain what the regular
  expression does, or
 -write no description of the regex
  at all and assume it's better that
  every programmer spend several
  hours decrypting it, or
 -maybe just delete it all and 
  never even offer it in the first
  place. That might be more efficient.

Typically this occurs on a web forum when someone
suggests a regex to another programmer for a
problem. They give the regex and then explain it
paragraph by paragraph instead of breaking it into
pieces and explaining each piece. Or they just give
the regex and offer no breakdown at all.

Break it down into pieces like this:
 (.*)    //this piece does..
 [a-z]*  //this piece does..
 [1-5]*  //this piece does..

Then combine it:
 (.*)[a-z]*[1-5]*  //this together does..

Regular expressions are to be broken down into pieces,
and each piece is discussed with a source code comment.

Stop writing one-line regular expressions and then
trying to explain them in essay format to prove that
your regular expression even does something.

Break them down into sections, source code comment them,
and then give the full regular expression on one line.
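
As a rough sketch of that advice, here is the example regex
from above written both ways in Python. The choice of Python
and its re.VERBOSE flag is my assumption, not something this
article depends on, and the comments simply state what each
piece matches.

```python
import re

# The regex broken into pieces, one source code comment per piece.
# re.VERBOSE makes the engine ignore the whitespace and the # comments.
COMMENTED = re.compile(r"""
    (.*)      # capture any leading text (greedy)
    [a-z]*    # then zero or more lowercase letters
    [1-5]*    # then zero or more digits from 1 to 5
""", re.VERBOSE)

# The full regular expression on one line, for quick reference.
ONE_LINE = re.compile(r"(.*)[a-z]*[1-5]*")
```

Both objects describe the same pattern; the commented one is
the one you will still understand later.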

The problem with regular expressions is that they are
quick and dirty. Just because you had full 
concentration and were able to write the regex on one 
line at that moment does not mean you will actually 
understand it six minutes later. Instead, you could have 
broken the regular expression into pieces, commented
each piece, and it would then be maintainable.

 -The time spent describing the regular expression as a
  whole is the same as, or more than, the time it takes
  to skip the regular expression in the first place and
  stick to long, drawn-out system and parsing
  functions. This is the case if you really know your
  programming language well.

 -Beginners/intermediates tend to use regexes often 
  (Perl and PHP people especially), while true hackers
  write parsers that are more maintainable.

 -The amount of time it takes to decrypt a regular 
  expression discourages most people from reusing
  regular expressions. People tend to write their own,
  since regexes are like a write-once technology.

 -Modifying a regex may turn out to be a nightmare,
  and it may screw up an entire program. Modifying
  a true parser that doesn't use regexes is usually much
  easier. Imagine, for a moment, if Free Pascal, 
  Visual C++, or the GNU C Compiler were based on regex 
  parsing.. good luck modifying and improving the 
  compiler.

 -One little mistake in a regular expression can bring
  a whole system down, such as one in an htaccess file
  in the root directory. This is not worth leaving
  uncommented. At least carry a descriptor file around
  if you don't want to make the htaccess file too
  cluttered with comment noise.
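
One way such a descriptor file could look is sketched below
in Python. Everything in it is hypothetical: the rewrite
pattern, the sample paths, and the names DESCRIPTOR and check
are made up for illustration. Note also that Apache uses its
own regex engine, so checking a pattern with Python's re
module is only a rough stand-in that I am assuming agrees for
simple patterns like this one.

```python
import re

# Hypothetical descriptor entries: each rewrite pattern is kept beside a
# plain-language description and sample paths it should and should not match.
DESCRIPTOR = [
    {
        "pattern": r"^article/([0-9]+)/?$",
        "meaning": "article/<numeric id>, with an optional trailing slash",
        "matches": ["article/42", "article/42/"],
        "rejects": ["article/", "article/42/edit", "articles/42"],
    },
]

def check(descriptor):
    """Return the (pattern, path) pairs that behave unexpectedly."""
    failures = []
    for entry in descriptor:
        regex = re.compile(entry["pattern"])
        for path in entry["matches"]:
            if regex.search(path) is None:
                failures.append((entry["pattern"], path))
        for path in entry["rejects"]:
            if regex.search(path) is not None:
                failures.append((entry["pattern"], path))
    return failures
```

An empty list from check(DESCRIPTOR) means every sample path
behaved as described; anything else is a pattern to look at
before it goes anywhere near the root directory.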

By the time someone has written paragraphs of explanation
about a regular expression (or, even more typically, has
not explained what it does anywhere), they could have
written a routine programmatically with normal system and
string functions and commented it properly.
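
To give a feel for what such a routine looks like, here is a
minimal sketch in Python. The task is one I made up for
illustration: split a string into a head and the run of
digits 1 to 5 at its end. The function name
split_trailing_digits and its allowed parameter are
hypothetical.

```python
def split_trailing_digits(text, allowed="12345"):
    """Split text into (head, tail), where tail is the longest run of
    characters from `allowed` at the end of text."""
    # Start at the end of the string and walk backwards for as long as
    # the character just before the cut is one of the allowed digits.
    cut = len(text)
    while cut > 0 and text[cut - 1] in allowed:
        cut -= 1
    # Everything before the cut is the head; the rest is the digit run.
    return text[:cut], text[cut:]
```

For example, split_trailing_digits("abc123") gives
("abc", "123"). A regex along the lines of (.*?)([1-5]*)
should do much the same job in one line, but the routine
above can be read, stepped through, and changed by someone
who has never seen it before.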

Regular expressions are a great tool, but don't think you
are elite for leaving that .htaccess mod_rewrite file
uncommented, with regular expressions as the only
explanation for what the .htaccess file does.

What's worse: some languages, e.g. Perl and PHP, force you
to write more than just the regular expression on one line.
They place the modifiers on the same line as the regular
expression. As if this little time saver is going to make
life easier in the long term when you are maintaining
the website or program in the future. 

Overall regular expressions and some languages remind me of
selfishness and write-once technology.

Make no mistake, regular expressions are quick and
dirty.. especially if you can read and remember your own.
But you've got to start thinking for the long term.


